ABSTRACT

Let f be a density function defined on the closed interval [a, b].
The k-means partition of this interval is defined to be the k-partition
with the smallest within cluster sum of squares. The properties of this
k-means partition when k becomes large will be obtained in this paper.
The results suggest that the k-means clustering procedure can be used to
construct a variable-cell histogram estimate of f using a sample of
observations taken from f.

1.   INTRODUCTION

Let the univariate observations $X_1, X_2, \ldots, X_N$ be sampled
from a distribution F with density function f. In cluster
analysis, the k-means clustering method (see Hartigan (1975),
Chapter 4) is often used to partition the sample of N observations
into k clusters with means $U_1, \ldots, U_k$. The resultant
clusters satisfy the property that no movement of an observation
from one cluster to another reduces the sample within cluster
sum of squares

$$\mathrm{WSS}_N = \sum_{i=1}^{N} \min_{1 \le j \le k} \| X_i - U_j \|^2 .$$


For these sample k-means clusters, if $I_j$ is used to denote the
interval containing all points in R closer to $U_j$ than to any
other cluster mean, then $(I_1, \ldots, I_k)$ defines a k-partition of
the sampled space. The corresponding k-means partition in the
population F is defined by the k population means $m_1, \ldots, m_k$,
which are selected in such a way that the within cluster (or
interval) sum of squares

$$\mathrm{WSS} = \int \min_{1 \le j \le k} \| x - m_j \|^2 \, dF(x)$$

is minimized.
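
As an illustration only, the sample within cluster sum of squares and the cut-points of the induced k-partition can be sketched in Python. The function names `sample_wss` and `cutpoints` and the six data values are assumptions made for this example; they do not come from the original development.

```python
import numpy as np

def sample_wss(x, means):
    """Sum over observations of the squared distance to the nearest mean."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    d2 = (x[:, None] - means[None, :]) ** 2   # squared distance to every mean
    return d2.min(axis=1).sum()

def cutpoints(means):
    """Interior boundaries of the intervals I_1, ..., I_k:
    midpoints between adjacent (sorted) cluster means."""
    m = np.sort(np.asarray(means, dtype=float))
    return 0.5 * (m[:-1] + m[1:])

# Hypothetical data: two well separated groups and their means.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
means = np.array([1.0, 11.0])
print(sample_wss(x, means))   # 4.0
print(cutpoints(means))       # [6.]
```

Every point on one side of a cut-point is closer to the mean on that side, which is why the intervals are determined by the midpoints of neighboring means.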

The k-means method has been widely used in clustering
applications (see Blashfield and Aldenderfer, 1978), and the
efficient computational algorithm given in Hartigan and Wong
(1979) has been included in the multivariate program BMDPKM of
the BMDP statistical package. The properties of sample k-means
clusters have also been studied by several investigators. In
Fisher (1958), and Fisher and Van Ness (1971), it is shown that
k-means clusters are convex, i.e., if an observation is a
weighted average of observations in a cluster, the observation
is also in the cluster. The asymptotic convergence (as
$N \to \infty$) of the sample k-means clusters to the population k-means
clusters for a fixed number of clusters k has been studied by
MacQueen (1967), Hartigan (1978), and Pollard (1981), who give
conditions that ensure the almost sure convergence of the set
of means of the k-means clusters. However, little
work has been done in examining the properties of population
k-means clusters, especially when k becomes large. In
Dalenius (1951), it is shown that the cut-point between neighboring
population clusters is the average of the means in the
clusters, and in Cox (1957), the cut-points for the k-means
clusters in the standard normal distribution are given for
$k = 1, \ldots, 6$.

In this paper, the asymptotic properties (as k becomes
large) of the population k-means clusters in one dimension are
obtained. It is shown in Section 2 that the optimal population
partition is such that the within cluster sums of squares of the
k cluster intervals are asymptotically equal, and that the sizes
of the cluster intervals are inversely proportional to the
one-third power of the underlying density at the midpoints of the
intervals. The implications of these results are discussed in
Section 3.


2.   ASYMPTOTIC PROPERTIES OF POPULATION K-MEANS CLUSTERS


Let f(x) be a density function defined on the interval
[a,b], and denote the ith derivative of f at x by $f^{(i)}(x)$.
Let the k-partition of [a,b] specified by the k-1 cutpoints
$a < y_1 < y_2 < \cdots < y_{k-1} < b$ be the k-partition with the smallest
within cluster sum of squares

$$\mathrm{WSS} = \sum_{i=1}^{k} \mathrm{WSS}_i = \sum_{i=1}^{k} \int_{y_{i-1}}^{y_i} (x - m_i)^2 f(x)\,dx ,$$

where $a = y_0$, $b = y_k$, and

$$m_i = \int_{y_{i-1}}^{y_i} x f(x)\,dx \Big/ \int_{y_{i-1}}^{y_i} f(x)\,dx .$$

In this section, we will describe the properties of this k-means
partition of a finite interval [a,b] as the number of cluster
intervals (or cells) becomes large.

Theorem:   Let f(x) denote a density function on the interval
[a,b], and let $a = y_{0k} < y_{1k} < \cdots < y_{(k-1)k} < y_{kk} = b$ be
the cutpoints specifying the k-means partition of [a,b]. If f
is positive and has four bounded derivatives in [a,b], then we
have, uniformly in $1 \le i \le k$,

$$k\, e_{ik}\, f_{ik}^{1/3} \to \int_a^b [f(x)]^{1/3}\,dx , \tag{2.1}$$

$$k\, p_{ik}\, f_{ik}^{-2/3} \to \int_a^b [f(x)]^{1/3}\,dx , \tag{2.2}$$

$$k^3\, \mathrm{WSS}_{ik} \to \Big[ \int_a^b [f(x)]^{1/3}\,dx \Big]^3 \Big/ 12 , \tag{2.3}$$

as $k \to \infty$, where

$$e_{ik} = y_{ik} - y_{(i-1)k} ,$$

$$f_{ik} = f\big( \tfrac12 y_{ik} + \tfrac12 y_{(i-1)k} \big) ,$$

$$p_{ik} = \int_{y_{(i-1)k}}^{y_{ik}} f(x)\,dx ,$$

and

$$\mathrm{WSS}_{ik} = \int_{y_{(i-1)k}}^{y_{ik}} \Big[ x - \int_{y_{(i-1)k}}^{y_{ik}} x f(x)\,dx \Big/ p_{ik} \Big]^2 f(x)\,dx .$$

(The theorem states that, for large k, the within cluster sums
of squares of the k intervals are nearly equal; it follows
that the length of the interval containing a point x of
density f(x) is proportional to $f(x)^{-1/3}$.)
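
The statements (2.1) and (2.3) can be checked numerically; the following Python sketch is an illustration under stated assumptions, not part of the proof. The density f(x) = 0.5 + x on [0, 1], the choice k = 20, and all function names are assumptions made here. The cutpoints are found by a simple fixed-point iteration that moves each cutpoint to the midpoint of the two adjacent cell means (the Dalenius (1951) condition); this is a Lloyd-type scheme, not the Hartigan and Wong (1979) algorithm. For a log-concave density such as this one, the stationary partition is expected to be the optimal one.

```python
import numpy as np

def f(x):
    # Illustrative density on [0, 1] (an assumption): f(x) = 0.5 + x.
    return 0.5 + x

def moments(u, v):
    # Closed-form integrals of f, x f and x^2 f over [u, v] for this density.
    p = 0.5 * (v - u) + (v**2 - u**2) / 2.0
    s1 = (v**2 - u**2) / 4.0 + (v**3 - u**3) / 3.0
    s2 = (v**3 - u**3) / 6.0 + (v**4 - u**4) / 4.0
    return p, s1, s2

def population_kmeans_cutpoints(k, tol=1e-12, max_iter=50_000):
    y = np.linspace(0.0, 1.0, k + 1)            # start from k equal cells
    for _ in range(max_iter):
        p, s1, _ = moments(y[:-1], y[1:])
        m = s1 / p                              # cell means
        new_inner = 0.5 * (m[:-1] + m[1:])      # midpoints of adjacent means
        shift = np.max(np.abs(new_inner - y[1:-1]))
        y[1:-1] = new_inner
        if shift < tol:
            break
    return y

k = 20
y = population_kmeans_cutpoints(k)
e = np.diff(y)                                  # cell lengths e_i
centers = 0.5 * (y[:-1] + y[1:])                # cell midpoints
p, s1, s2 = moments(y[:-1], y[1:])
wss = s2 - s1**2 / p                            # within cluster sum of squares per cell

# Integral of f^(1/3) over [0, 1], in closed form for this density.
target = 0.75 * (1.5 ** (4.0 / 3.0) - 0.5 ** (4.0 / 3.0))

scaled_lengths = k * e * f(centers) ** (1.0 / 3.0)   # left side of (2.1)
scaled_wss = k**3 * wss                              # left side of (2.3)
print("integral of f^(1/3):", target)
print("k e f^(1/3) range:  ", scaled_lengths.min(), scaled_lengths.max())
print("k^3 WSS range:      ", scaled_wss.min(), scaled_wss.max())
print("limit in (2.3):     ", target**3 / 12.0)
```

The cells come out longer where the density is lower, and the scaled quantities vary little from cell to cell, which is the behavior the theorem describes.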

Proof:   The proof is in four parts.

(I)   The k-partition of [a,b] consisting of k equal intervals
has a within cluster sum of squares of order $k^{-2}$; the contribution
from the ith interval to the optimal within cluster sum
of squares is of order $e_{ik}^3$. Since the optimal within cluster sum
of squares cannot exceed that of the equal-interval partition,
$\sum_{i=1}^{k} e_{ik}^3 = O(k^{-2})$, which implies that
$\sup_i e_{ik} = O(k^{-2/3})$. To avoid complexity of
notation, the k's indexing the partition will be dropped.
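
The order $k^{-2}$ claimed for the equal-interval partition in part (I) can be illustrated with a short Python sketch. The density f(x) = 0.5 + x on [0, 1] and the function names are assumptions chosen for this example only. If the claim holds, $k^2$ times the within cluster sum of squares should stay roughly constant as k grows (near 1/12 here, because the interval has unit length and f integrates to one).

```python
import numpy as np

def cell_wss(u, v):
    # Closed-form moments of the illustrative density f(x) = 0.5 + x on [u, v].
    p = 0.5 * (v - u) + (v**2 - u**2) / 2.0           # integral of f
    s1 = (v**2 - u**2) / 4.0 + (v**3 - u**3) / 3.0    # integral of x f
    s2 = (v**3 - u**3) / 6.0 + (v**4 - u**4) / 4.0    # integral of x^2 f
    return s2 - s1**2 / p                             # integral of (x - mean)^2 f

def equal_partition_wss(k):
    # Within cluster sum of squares of the partition into k equal intervals.
    y = np.linspace(0.0, 1.0, k + 1)
    return cell_wss(y[:-1], y[1:]).sum()

for k in (10, 20, 40):
    print(k, k**2 * equal_partition_wss(k))
```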


(II)   In this part of the proof, it will be shown that lengths of
neighboring clusters are of the same order of magnitude. Let
$m_i$ be the mean of the ith interval. Then

$$m_i = \int_{y_{i-1}}^{y_i} x f(x)\,dx \Big/ \int_{y_{i-1}}^{y_i} f(x)\,dx .$$

Consider any two neighboring intervals $e_j$ and $e_{j+1}$. By the
optimality of the partition, as is shown in Dalenius (1951),

$$y_j - m_j = m_{j+1} - y_j .$$

Thus,

$$\begin{aligned}
e_j &\ge y_j - m_j \\
    &= m_{j+1} - y_j \\
    &= \int_{y_j}^{y_{j+1}} x f(x)\,dx \Big/ \int_{y_j}^{y_{j+1}} f(x)\,dx - y_j \\
    &= \int_0^{e_{j+1}} x f(x + y_j)\,dx \Big/ \int_0^{e_{j+1}} f(x + y_j)\,dx \\
    &\ge \frac12 \cdot \frac{M_l}{M_u} \cdot e_{j+1} ,
\end{aligned}$$

where $M_l = \inf_{a \le x \le b} f(x)$ and $M_u = \sup_{a \le x \le b} f(x)$.

Similarly, $e_{j+1} \ge \frac12 \cdot \frac{M_l}{M_u} \cdot e_j$.

(III)   We will now establish the asymptotic relationship between
the lengths of neighboring intervals. Denote the center of the
ith interval by $C_i$ (i = 1, ..., k). It follows that
$C_i = y_{i-1} + \frac12 e_i$. Using the Taylor series expansion, we have, for
any x in the ith interval,

$$f(x) = f(C_i) + (x - C_i) f^{(1)}(C_i) + \tfrac12 (x - C_i)^2 f^{(2)}(C_i) + \tfrac16 (x - C_i)^3 f^{(3)}(C_i) + \tfrac1{24} (x - C_i)^4 f^{(4)}(q_x) ,$$

where $q_x$ is between x and $C_i$. Since the first four
derivatives are bounded on [a,b], it follows from the above
series expansion that we have simultaneously for all $1 \le i \le k$,

$$p_i = \int_{y_{i-1}}^{y_i} f(x)\,dx = e_i \big[ f(C_i) + \tfrac1{24} f^{(2)}(C_i)\, e_i^2 + O(e_i^4) \big] , \tag{2.4}$$

and

$$\int_{y_{i-1}}^{y_i} x f(x)\,dx = e_i \big[ C_i f(C_i) + \tfrac1{12} f^{(1)}(C_i)\, e_i^2 + \tfrac1{24} C_i f^{(2)}(C_i)\, e_i^2 + O(e_i^4) \big] . \tag{2.5}$$

(Note that the universal bound contained in the O term depends
only on the various bounds of the derivatives of f and is
independent of i.)

Therefore,

$$m_i = \int_{y_{i-1}}^{y_i} x f(x)\,dx \Big/ p_i = C_i + \frac1{12} \cdot \frac{f^{(1)}(C_i)}{f(C_i)}\, e_i^2 + O(e_i^4) . \tag{2.6}$$

Since the partition is optimal, we have simultaneously for all
$1 \le i < k$, $(C_i + \frac12 e_i) - m_i = m_{i+1} - (C_{i+1} - \frac12 e_{i+1})$, which when
combined with (2.6) gives

$$e_i - \frac16 \cdot \frac{f^{(1)}(C_i)}{f(C_i)}\, e_i^2 + O(e_i^4) = e_{i+1} + \frac16 \cdot \frac{f^{(1)}(C_{i+1})}{f(C_{i+1})}\, e_{i+1}^2 + O(e_{i+1}^4) .$$

Since it has been shown in part (II) that $e_i$ and $e_{i+1}$ are of
the same order of magnitude, we have for all $1 \le i < k$,

$$e_{i+1} + \frac16 \cdot \frac{f^{(1)}(C_{i+1})}{f(C_{i+1})}\, e_{i+1}^2 = e_i - \frac16 \cdot \frac{f^{(1)}(C_i)}{f(C_i)}\, e_i^2 + O(e_i^4) .$$

It follows that

$$e_{i+1} = e_i \Big\{ 1 - \frac16\, e_i \Big[ \frac{f^{(1)}(C_i)}{f(C_i)} + \frac{f^{(1)}(C_{i+1})}{f(C_{i+1})} \Big] + O(e_i^2) \Big\} .$$

After some Taylor series manipulation, we have

$$e_{i+1}/e_i = [ f(C_{i+1})/f(C_i) ]^{-1/3} \cdot [ 1 + O(e_i^2) ] . \tag{2.7}$$


Moreover, since it can be shown from (2.4), (2.5), and (2.6) that

$$\mathrm{WSS}_i = \int_{y_{i-1}}^{y_i} (x - m_i)^2 f(x)\,dx = \frac1{12} f(C_i)\, e_i^3\, [ 1 + O(e_i^2) ] ,$$

and from (2.4), $p_i = f(C_i)\, e_i\, [1 + O(e_i^2)]$, we obtain from (2.7)
that

$$\mathrm{WSS}_{i+1}/\mathrm{WSS}_i = 1 + O(e_i^2) \tag{2.8}$$

and

$$p_{i+1}/p_i = [ f(C_{i+1})/f(C_i) ]^{2/3}\, [ 1 + O(e_i^2) ] . \tag{2.9}$$

(IV)   Finally, we will now establish the relationship between $e_i$
and $e_j$ for any $1 \le i < j \le k$. It follows from (2.7) that for any
pair of values of $1 \le i < j \le k$,

$$e_i/e_j = [ f(C_i)/f(C_j) ]^{-1/3} \{ [1 + O(e_i^2)] [1 + O(e_{i+1}^2)] \cdots [1 + O(e_{j-1}^2)] \} .$$

But it has been shown in part (I) that $\sup_i e_i = O(k^{-2/3})$.
Hence,

$$e_i/e_j = [ f(C_i)/f(C_j) ]^{-1/3}\, [ 1 + O(k^{-4/3}) ]^k = [ f(C_i)/f(C_j) ]^{-1/3}\, [ 1 + O(k^{-1/3}) ]$$

for all $1 \le i < j \le k$, which implies that we have uniformly in
$1 \le i < j \le k$,

$$(e_i/e_j) \cdot [ f(C_i)/f(C_j) ]^{1/3} \to 1 \quad \text{as } k \to \infty .$$

Since $\sum_{i=1}^{k} e_i\, f(C_i)^{1/3} \to \int_a^b f(x)^{1/3}\,dx$, (2.1) follows.

Similarly, from (2.8) and (2.9), we have uniformly in $1 \le i < j \le k$,
$\mathrm{WSS}_i/\mathrm{WSS}_j \to 1$, and $(p_i/p_j) \cdot [ f(C_i)/f(C_j) ]^{-2/3} \to 1$ as
$k \to \infty$, which in turn gives (2.3) and (2.2) respectively. And
the theorem is proved.

3.   DISCUSSION

In this paper, our effort is directed towards obtaining the
properties of univariate population k-means clusters when k
becomes large. The properties given in Section 2 indicate that
the lengths of the population k-means intervals (or cells) are
adaptive to the underlying density function: the intervals are
large where the density is low, and small
where the density is high. This result suggests that the k-means
clustering procedure can be used to construct a variable-cell
histogram estimate of an underlying density using a sample of
observations taken from that density (see Wong, 1980). Such a
density estimation method is of interest because it makes use of
the computationally efficient k-means clustering procedure
(Hartigan and Wong, 1979), which is also applicable to
multivariate data.
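
One way such a variable-cell histogram might be built is sketched below in Python. This is a minimal illustration under assumptions made here, not the procedure of Wong (1980): the clustering step is a plain Lloyd-type iteration rather than the Hartigan and Wong (1979) algorithm, the outer cell boundaries are taken to be the sample minimum and maximum, and the function names and the Beta(2, 5) test sample are hypothetical.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=200):
    """Lloyd-type iteration for one-dimensional k-means; returns sorted means."""
    x = np.sort(np.asarray(x, dtype=float))
    means = np.quantile(x, (np.arange(k) + 0.5) / k)   # quantile-based start
    for _ in range(n_iter):
        cuts = 0.5 * (means[:-1] + means[1:])          # midpoints of adjacent means
        labels = np.searchsorted(cuts, x)              # index of the nearest mean
        new_means = means.copy()
        for j in range(k):
            members = x[labels == j]
            if members.size > 0:                       # keep the old mean if a cell is empty
                new_means[j] = members.mean()
        if np.array_equal(new_means, means):
            break
        means = new_means
    return means

def kmeans_histogram(x, k):
    """Variable-cell histogram: cell edges from k-means, heights = proportion / length."""
    x = np.asarray(x, dtype=float)
    means = kmeans_1d(x, k)
    inner = 0.5 * (means[:-1] + means[1:])
    edges = np.concatenate(([x.min()], inner, [x.max()]))
    counts, _ = np.histogram(x, bins=edges)
    density = counts / (x.size * np.diff(edges))
    return edges, density

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=2000)     # hypothetical test sample
edges, density = kmeans_histogram(sample, k=10)
print(np.round(edges, 3))
print(np.round(density, 3))
```

By construction the estimate integrates to one over the sample range, and the cells should tend to be narrower where the sample is denser.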